Use the Recognition component to create searchable PDFs composed of recognized text and images. Optimal compression is chosen for each image based on OCR zone information recovered during recognition, producing smaller PDF files. Highly compressed PDFs are best suited for scanned documents.
Follow these steps to save a highly compressed PDF:
- Load the image
- Recognize the image
- Export recognition data to PDF
- Save the PDF
Load the Image
The Loading Images section explains how to load an image. Alternatively, an image can be scanned into memory using the TWAIN and ISIS components.
Recognize the Image
Before an image can be used with the highly compressed PDF functionality, it must be recognized with the Recognition component. First use the IG_REC_image_import function to prepare an image for recognition. Then use IG_REC_image_recognize to generate its recognition data. More details are in the Optical Character Recognition section.
Zone data generated during recognition is used to choose optimal image compression.
Any changes to the recognized zone data, whether through manual zones or other zone options, may adversely affect the final PDF file size.
When manually zoning an image, take care to mark any picture data with the IG_REC_WT_GRAPHIC zone type.
Export Recognition Data to PDF
First create a new PDF document with IG_mpi_create and IG_PDF_doc_create.
Next, prepare an AT_REC_PDF_PAGE_OPTIONS structure to re-compress the source image and add invisible text into the PDF page:
• Set SegmentImage to TRUE.
• Set VisibleImage to TRUE.
• Set VisibleText to FALSE.
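The three settings above can be sketched in C as follows. This is only an illustration: SketchPdfPageOptions and sketch_high_compression_options are hypothetical stand-ins, not part of ImageGear. The real AT_REC_PDF_PAGE_OPTIONS structure is declared in the ImageGear headers, may contain additional members, and may use TRUE/FALSE with a different member type than the C99 bool assumed here.

```c
#include <stdbool.h>

/* Hypothetical stand-in for AT_REC_PDF_PAGE_OPTIONS.
   Only the three members named in the list above are modeled;
   the real structure may have more members and other types. */
typedef struct SketchPdfPageOptions {
    bool SegmentImage;  /* segment and re-compress the source image */
    bool VisibleImage;  /* keep the page image visible */
    bool VisibleText;   /* show or hide the recognized text */
} SketchPdfPageOptions;

/* Returns the settings for a highly compressed, searchable page:
   a visible re-compressed image with invisible text. */
SketchPdfPageOptions sketch_high_compression_options(void)
{
    SketchPdfPageOptions options;
    options.SegmentImage = true;
    options.VisibleImage = true;
    options.VisibleText  = false;
    return options;
}
```

In real code, the same three assignments would be made on an AT_REC_PDF_PAGE_OPTIONS variable before it is passed on to page creation.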
Then call the Recognition component's IG_REC_PDF_page_create function for each recognized page to append a new highly compressed PDF page to the document.
Save the PDF
After the PDF document is created and its pages are added, use the IG_mpi_file_save function to save the document to disk. Make sure to pass IG_FORMAT_PDF for the nFormat parameter.